OPEN GAccEss Freely available online 


) Journal of Clinical Research & Bioethics 


Review Article 


Interdisciplinary Remarks on the Assessment List on Trustworthy AI 
(ALTAI) Applied to a P5 Medicine Tool 


Denise Amram", Arianna Cignoni’, Tommaso Banfi’, Gastone Ciuti? 


'Lider-Lab, Dirpolis Institute, Scuola Superiore Sant’Anna, Pisa, Italy;’Department of Excellence in Robotics & AI, Scuola Superiore 


Sant'Anna, Pisa, Italy 


ABSTRACT 


The authors apply the Assessment List on Trustworthy AI (ALTAI) to a possible AL-based tool aiming at supporting 
the melanoma cancer diagnosis. They take the opportunity to provide an interdisciplinary analysis of the proposed self 
assessment tool in light of its possible mandatory application in R@D&I. The presented empirical exercise highlights 
some pros and cons of the adopted checklist, stimulating further remarks on the EU regulatory initiatives on AI. 


Finally, we try to understand the improvement of procedures, medical knowledge and treatment collected and improved 
during these months, that allowed for a lower mortality rate in the referring period. 
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INTRODUCTION 


In the last months, the EU accelerated the legislative process on 
Artificial Intelligence (AI). In July 2020, the High-Level Expert 
Group on Artificial Intelligence (AI HLEG) presented the final 
Assessment List for Trustworthy Artificial Intelligence (ALTAI). 
While, in October 2020, the European Parliament adopted a 
“Provisional text on the Framework of ethical aspects of artificial 
intelligence, robotics and related technologies”, where the risk- 
based approach is confirmed as main strategy for the further 
Al legislative initiatives that the EU Commission is working on 
(hereinafter “Provisional Resolution”) [1]. 


LITERATURE REVIEW 


From this perspective, a deep analysis of the ALTAI methodology 
and structure becomes crucial to address in a responsible and 
proactive way the current compliance challenges for those who 
develop Al-based systems. ALTAI consists of a series of questions 
that may steer the AI designers (rectius the ALcontroller, or 
developers according to the mentioned Provisional Resolution) 
towards a multidisciplinary evaluation path aimed at addressing 
the seven ethical-legal-safety compliance challenges emerging by the 
Guidelines on Trustworthy AI, adopted by the EU Commission in 
April 2019 [2,3]. 


According to them, an AI system becomes trustworthy whereas it is 
lawful (i.e., compliant with the applicable legal framework), ethical 
(i.e., compliant with the applicable ethical framework) and robust 
(i.e., compliant with the applicable safety standards). The interplay 


between these three pillars is determined by the following seven 


grounds [4,5]. 
Human agency and oversight 


It includes both the ethical and the legal dimension as it refers to 
fundamental rights protection aimed at maintaining the balance 
between human control and technical progress in terms of human 
agency and oversight. Human beings shall be protected both as 
individuals and groups, taking into account inclusiveness, fairness, 
non-discrimination and vulnerabilities protection as paramount 
interests. 


Technical robustness and safety 


It refers to the system resilience to attacks and security, including 
fall back plans and the compliance with the highest levels of general 
safety, accuracy, reliability, and reproducibility. 


Privacy and data governance 


This profile establishes a bridge with the most effective compliance 
process by design and by default introduced for personal data 
processing by the EU Reg. n. 2016/679 on General Data 
Protection Regulation (GDPR) aiming at guaranteeing the respect 
for confidentiality, quality, and integrity of data. 


Transparency 


This is a principle established to guarantee the traceability, 
explainability, communication of methods, goals, and results of 
the given AI system. 
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Diversity, non-discrimination, and fairness 


This ground refers to the interdisciplinary safeguards to be 
implemented in order to avoid a misuse or an unfair use of AI in 
terms of bias, accessibility, and universal design. 


Societal and environmental well-being 


The AI system shall be put in the market as a sustainable solution 
under the environmental, social, and societal perspectives, 
considering the democratic values rooted within the EU framework. 


Accountability 


This is the main principle that enables the compliance process in 
terms of proactively responsibilities allocation through a risk-based 
approach that includes auditability, minimization and reporting of 
negative impact, trade-offs, and redress. 


In order to facilitate to undertake the ALTAI checklist, a web-based 
tool that provides results in terms of level of compliance and list of 
recommendations has been developed. This could be followed by 
anyone who is responsible to design and develop an AI system to 
confidentially perform a self-assessment. This preparatory tool can 
be interpreted as a pilot for possible future obligations on the topic, 
as the risk-based approach addressed through an impact assessment 
seems to be suggested by the abovementioned EU Parliament 
Resolution as well. 


In the following paragraphs, we will assess the ALTAI checklist 
on possible design of Al-tools applied to healthcare, considering 
the paramount role that the health-data debate has among the 
European strategy for data and the related investments to encourage 
the upscaling of cross-border exchange of health data and their re- 
use to improve the detection, diagnosis, and treatment of diseases 


[6]. 


In particular, we will provide some methodological remarks 
emerging from the application of the ALTAI to an ALbased tool 
developed through a Predictive, Preventive, Personalised, and 
Participatory (P4) cancer medicine approach, in order to contribute 
to the debate from a bottom up and interdisciplinary perspective. In 
fact, within this empirical assessment, our efforts focused on both 
explaining the tool design from a scientific viewpoint, covering 
engineering, ethical-legal, and medical aspects and on introducing 
technical and organizational enablers to allow the real participation 
of unexpert users in the prevention of specific pathologies. 


According to the main digitalization challenge of healthcare, we 
addressed our remarks to boost outcomes personalization features- 
based on the user/patient ones-starting from the contribute that 
high level datasets of users’/patients’ health-data could bring to 
the enhancement of the healthcare sector. Sensitive health-data 
processing, indeed, plays a crucial role in the development and 
deployment of these specific tools that impact both on patients 
as a vulnerable group, and on individuals. In the event that the 
ALbased technologies would support wrong evaluations or would 
provide non-compliant data management strategies, they could 
lead to misjudgments and adverse outcomes. Consequences 
might be envisaged not only in terms of physical harm for a given 
patient, but they could also affect the psychological dimension of 
a given patient/end user. For this reason, our remarks deal with a 
further challenge, as hereinafter we will refer to the P5 medicine, 
including “psycho-cognitive” aspects arising from the commented 
technologies [7]. 
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In this paper, therefore, we will deal with common issues - 
ranging from ethical-legal to technical ones-to properly address the 
compliance activities related to the technical, as well as ethical-legal, 
challenges emerging in the context of AL-based tools development 
for the implementation of an effective P5 medicine. At the same 
time, we will suggest possible solutions to cover gaps and lacks 
emerging from the available ethical-legal assessment tools. 


PRELIMINARY ACTIVITIES TO PERFORM A 
RESPONSIBLE ALTAI 


ALTATI is a method to drive the self-assessment and it is based on 63 
questions divided into the above-illustrated seven key-requirements. 
Interdisciplinary expertise is required to answer since the very 
first questions. A close yes/no answer is not sufficient, in fact, to 
perform the analysis and achieve a conscious and resolute opinion 
about the level of trustworthiness of the designed tool. The ALTAI 
is a method to drive the self-assessment [9]. 


For example, the first block of questions (Q1-Q6.2) is focused on 
assessing the impact of the designed application on fundamental 
rights. Therefore, a deep knowledge of what is meant for fundamental 
rights protection and how to assess the corresponding impact 
shall be introduced in the evaluation workflow. Furthermore, to 
properly answer the second block (Q7-Q18.5), a deep knowledge 
of cybersecurity and safety standards is required as, again, yes/ 
no answers are not sufficient to the whole trustworthy evaluation 
process. In particular, the Al-designer shall justify the reasons that 
bring to choose a given technical measure as well as to implement 
a safeguard instead of another one under several grounds, 
including human safety, animal protection, environment, security, 
and misuse. Within the same context, it is required to identify a 
“fallback plan” aimed at ensuring the maintenance of an acceptable 
level of risks also if something goes wrong. Those evaluations are 
functional also in a prognostic perspective to answer questions 
included in the sixth block (Q49-Q53), related to the societal and 
environmental wellbeing. Therefore, the trustworthiness combines 
the need to prevent harms as well as to address the current multi- 
faced challenges of societal empowerment [10-12]. 


The third block (Q19-Q29.3) recalls the GDPR compliance and 
the results of the data protection assessment performed under 
article 35 GDPR for the given personal data processing shall be 
confirmed within the more comprehensive framework of the AI- 
based system. In this context, the Data Protection Officer (DPO) 
involvement is not only suggested, but also an organizational 
measure to be assessed in order to reach the trustworthy standard 
of the developed ecosystem. As a preliminary activity, the AI 
designer/developer has to verify whether or not a data protection 
officer and/or a privacy expert shall be consulted and appointed. 
This profile is strictly connected with the first block as the impact 
on the other fundamental rights than data protection reasonably 
stands in a cause-effect relationship with the protection of the 
confidentiality, availability, and integrity of personal data. For 
instance, a data breach concerning an AL-application for patients 
could also infringe their health, or private life, dignity, etc., but it 
could also identify possible grounds of discrimination according to 
the end users’ vulnerability. This last profile on fairness is addressed 


by the fifth block (Q41-Q48) of the checklist assessment [12]. 


The fourth block (Q30-Q40.2) on transparency consists of a series 
of questions related to the development of each AL-based system, 
considering it as the result of a series of human decisions taken by 
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the AI controller. In a binding legal framework that defines roles 
and responsibilities, he/she may assume the role of AL-controller, 
similarly to the data controller as defined by article 5 GDPR. AL 
based systems, in fact, firstly include the identification of methods 
for data acquisition regarding the function/algorithm that must 
be applied to a previously determined dataset. Secondly, the AL 
controller shall define what tasks shall the AI perform (the so- 
called required actions) as well as the final purposes (goals) of the 
automated decision making/reasoning activities [13]. 


The last block (Q54-Q63) will assess the overall level of 
accountability of the process, suggesting the implementation of 
organizational measures aimed at monitoring the process and also 
providing solutions in case of issues. 


A first step to be performed by the AI controller/developer is to 
engage an interdisciplinary team aimed at strengthening a dialogue 
and share best practices for the ALTAI purposes. Furthermore, 
an independent advisor might support the assessment in order to 
interpret notions and corresponding adjectives (e.g., what is meant 
for “meaningful interactions and appropriate human oversight and 
control” in Question n. 44, “an adequate working definition of 
fairness” in Question n. 44, “wide range of individual preferences” 
in Question n. 45). 


Secondly, the identified specific parameters and standards shall 
drive the overall assessment and provide a robust and coherent 
internal framework of reference. This is true also for non-technical 
requirements, whose harmonized application shall emerge from the 
individual answers given to each block. For instance, if we identify 
a ground of vulnerability in respect to the impact on fundamental 
rights, the same analysis shall be reproduced within the assessment 
of fairness as well as within the governance-related issues and social 
impact ones [14,15]. 


Once that the technical and organizational measures to reach the 
acceptable level of trustworthiness of the Al-ecosystem have been 
implemented, a continuous evaluation system shall be maintained 
in order to ensure upgrades both in terms of performance 
and enhancement of fundamental rights. To this end, proper 
mechanisms of check and balance could be introduced within 
codes of conducts, encouraged by article 40 GDPR for personal 
data processing. They currently provide a compliance support for 
small and medium enterprises, addressing common issues and 
challenges in terms of self-regulation and best practices not only 
for personal data processing, but for the development of Al-based 
systems as well [16]. 


These preliminary remarks shall orient the Al-controller/developer 
towards a legal attentive design of the given application. 


In the following paragraphs, we will present the results of a 
discussion between an interdisciplinary group, including scholars 
in law, biomedical engineering, and computer science, on how to 
develop a trustworthy Al-based tool aimed at early detecting the 
melanoma skin cancer. 


The digitalization of the healthcare services, in fact, is boosted 
where existing data flows can be re-usable to train (such as in our 
case-study) an algorithm that could process information in order 
to predict a decision. These tools shall be enabled within a robust 
data governance ecosystem aimed at enabling the healthcare 
service among stakeholders and, at the same time, ensuring the 
exercise of their rights. Data collection, processing, and storage 
shall, therefore, allow mechanisms of re-training of the algorithms, 
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while providing a specific prediction/decision making result 
for the user/users. The interoperability for data formats and the 
enhancement of data security shall be combined with acceptable 
levels of pseudonymisation and anonymisation for the training, as 
well as on the linking and de-linking of records for the given query. 


These remarks become crucial to enable innovative solutions, care 
delivery, including the opportunity for patients to control and 
administer care themselves. Considering that the P5 medicine 
includes keywords as participative, preventive and personalised, 
highlighting how the participation of the patient is paramount to 
prevent adverse outcomes in pathologies development, our aim is 
to assess the ALTAI while assessing a given technology in a mutual 
exchange of technical and organizational good practices. Under 
the purpose to overcome possible practical issues emerging from 
this innovative mechanism of assessment and, at the same time, to 
properly address the ethical-legal compliance in R@D&I sectors, 
we will highlight weaknesses and strengthens of the proposed 
structured evaluation system [17]. 


BUILDING UP A TRUSTWORTHY AI-BASED 
TOOL FOR P5 MEDICINE 


Our analysis starts from the need to develop an ethical-legal by 
design and by default AL-based tool to support the early-detection 
of melanomas. Melanoma has the highest mortality rate among 
skin cancers, and it can grow from the early stage (called melanoma 
insitu) to the latest stage (metastatic melanoma) in a period 

ranging from 8 to 12 months. An early diagnosis is essential to 
improve the survival rate and reduce treatments costs. Since 
this pathology appears on the skin surface, it can be detected by 
monitoring changes of the skin itself. This condition has peculiar 
morphological attributes and an expert clinician is needed to make 
a diagnosis. Nevertheless, melanoma lesions are not easy to detect 
in its early stage when these lesions present borderline features, 
even to an expert eye. AI technology may aid clinicians in the 
melanoma diagnosis, especially in its earlier stage. The use of the 
smartphone that has become a large-scale deployable tool, paired 
with this kind of technology may enable a common user to take an 
active and participative role into the skin cancer prevention [18-22]. 


As anticipated, despite of the internal allocation of technical 
tasks, the ALcontroller/developer shall identify a series of roles 
aimed at giving advice and being responsible of some specific 
tasks within the ALTAI context. The first preliminary issue is the 
allocation of human resources in the design process. From a law 
and policy making perspective, indeed, incentive mechanisms shall 
be identified in order to overcome the lack of effectiveness of a 
(still) unbinding approach. In this regard, the GDPR legislative 
model seems particularly effective, as it includes a series of 
binding obligations framed within a structured illustration of 
principles, roles, and enforcing tools. In addition, a frame of 
optional mechanisms and safeguards that each legal system may 
decide to regulate or not. The second transversal issue is related to 
establishment of a multi-level governance system. This is functional 
to identify a monitoring process and coordinate the engagement of 
the previously identified roles. 


In the analyzed P5 medicine tool, a technical board that includes 
medical doctors, biomedical engineers, software engineers, data 
protection and ethics experts shall be established to interpret 
the different interdisciplinary key-requirements that the ALTAI 
assessment provides. This technical board shall be able to arise 
issues and identify solutions in light of a general mutual purpose 
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to co-create a new technology enhancing fundamental rights 
protection. Such a technical board could also host different 
expertise for external advice as well as include stakeholders and 
end-users’ opinions to assess different solutions. 


Furthermore, the multilevel governance shall comply with the 
applicable legal framework that means for example that the GDPR 
governance in terms of appointment of joint controllers and data 
processors shall be structured according to the security governance 
determined by the specific standards followed by the developer. 
In addition, further engagements are envisaged by each block of 
questions. This may contribute to share responsibilities and to 
prove the overall accountability within the process. 


The framework becomes more complex in case of public/private 
stakeholders as well as in case of cross-border relationships between 
the identified players since the national compliance process may 
present some gaps/overlapping profiles. In this regard, the possible 
legislative misalignment shall be addressed and covered during the 
assessment by specific agreements between the involved parties. 


The third step is end-users centered. In fact, compliance activities 
shall deal with the main features that characterize those persons 
or groups of persons whose data are processed (i.e., the data 
subjects under the GDPR) and those who are the addressees of the 
prediction/automated decision-making process. The two categories 
might sometimes overlap, but they usually do not. The ALTAI 
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process shall deal with all possible end-users both to protect and 
enhance their rights. To this end, the technical board shall open 
to collect feedback as well to include a validation step with end- 
users to collect feedback not only on the technical level related to 
the automated decision-making, but also on its usability. In fact, in 
our example an ALbased tool will be addressed both to clinicians 
and patients with evident differences in terms of awareness, risks, 
benefits, and impact on corresponding rights. For instance, health 
protection will be assessed in terms individual fundamental right 
for each patient and in the collective dimension as far as the 
clinicians are concerned. At the same time, the supportive role of 
the AI shall be properly addressed in order to do not make the 
human decision too overconfident or, on the contrary, providing a 
replacement distress among professionals. Once that these profiles 
have been addressed as priorities by the Al-controller/developer, 
the ALTAI might be filled. 


ALTAI GAPS AND STRENGTHS 


Questions developed within the ALTAI checklist have been 
interpreted to design an Al-based tool aimed at early-detecting 
melanoma by a decision-making system that processes images, in 
order to highlight gaps and strengths of the checklist. 


The chart below shows the results of the ALTAI analysis and 
possible comments that may either address good practices or issues 


(Table 1). 


Table 1: Results of the ALTAI analysis and possible comments. 


Human Agency and oversight Answer 


Comments 


Fundamental rights 
health, data protection. 


Clinicians fundamental rights involved are dignity 


and work-life. 


Patients fundamental rights involved are dignity 


This key-requirement arises the need to identify the list of 
fundamental rights. 

The list could be included within the Terms & Conditions 

of the given system/device/tool to accomplish an extensive 


The decision-making process could interact with 
patients stimulating their awareness towards the risk 
of melanoma, suggesting contacting a clinician. 
The tool could recommend the patient to get clinical 
advice and therefore it could interfere patient's deci- 
sion to get or not to get a clinical advice. 
Individual vulnerabilities shall be addressed in the 
information section before using the tool. 

The clinicians may be supported in the pre-screening 
activities, but diagnosis shall be performed under the 
current clinical practice. 


information duty. 
A possible survey could be implemented to identify whether 
or not the end-user is vulnerable as well as her/his attitudes 
towards the results of the decision-making system (e.g., some- 
one who assumes drugs could be temporary vulnerable and 
the use of the tool may cause undesirable consequences). 
Communication interfaces shall be addressed in a clear and 
user-friendly format. 


Human Agency 


The AI system shall support the cancer prevention 
actions of the healthcare system, revising internal pro- 
cesses in light of such a support in the pre-screening 


Disclaimer on the system characteristics shall be included 
both in the Terms and Conditions and in the handbook. 


Awareness and training campaigns shall be promoted among 


activities. No risks of overreliance and overconfidence clinicians. 
in the AI system, as it gives just a prescreening 
information. Clinical diagnosis shall in any case be 
provided through gold standard methodologies. 
Human oversight Technical Answer Comments 


robustness and safety [23] 


Resilience to attack and 
security 


A monolithic system ensures a safer ecosystem. Re- 
garding AI models, specific technical methodologies 
may be implemented to ameliorate model robustness 
against external perturbations and attacks. A further 
layer of security might be implemented in the model 

itself by (1) using ad hoc methods enabling the use 
on encrypted data as input to certain models, and (2) 
adopting models and libraries optimized for privacy 

sensitive applications (e.g., Opacus) [24]. 


Stress tests shall be scheduled to assess the overall resilience 
of the system to attacks and breaches 
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Fallback plan and general safety A fallback plan aligned to the general backup policy 
is applied. As the learnability of a specific problem is 
uncertain a-priori, a rigorous scientific methodology 
should be applied to the evaluation of a solution per- 

formance. To gauge the level of uncertainty associated 
to a specific prediction of a specific model, and hence 
to mitigate the effect of identifiable unreliable predic 
tions, specific methodological approaches may be ] 
used (e.g. as reported in previous studies [25-28] 


Adding redundancy in the software and hardware compo- 
nents of the system may also be a viable and widely adopted 
strategy to boost the overall system safety and ensure prompt 

recovery from anomalies, also enabling internal and auto- 

matic system diagnostic e.g. as done in avionics systems or, 
regarding AI models, using ensemble of models. 


Accuracy In general, a specific AI algorithm architecture is used 
to solve a certain task and needs a specific metric to 
correctly evaluate its performances. The AL-controller/ 
developer is responsible to identify the proper statisti- 
cal method for the specific case of implementation. 
For instance, a single shot detector model is used for 
object detection problems and its performances are 
evaluated using mean average precision metric [29]. 
As an example, in the specific case of classification 
task, the tool accuracy is evaluated as the ratio be- 
tween the sum of True Positives (TP) and True Nega- 
tives (TN) values scored by the algorithm on the train- 
ing data and the total number of training samples. 
Other metrics, such as sensitivity (ratio between True 
Positives and all positive samples) and specificity (ratio 
between True Negative and all Negative samples), are 
computed to better understand the learning level of 
the implemented algorithms. 


In the current application, true Positives represent the num- 
ber of positive samples (i-e., patients with melanoma) that are 
scored by the tool correctly affected by the pathology. True 
Negatives represent the number of negative samples (i.e., pa- 
tients not affected by melanoma) the tool scores as healthy. In 
this field of application (i.e., melanoma detection), sensitivity 
is usually maximized at the expense of specificity. 


Reliability and reproducibility A specific monitoring system has been implemented 
to assess the algorithm performance. Code shall be 
extensively documented and it could be versioned 
using an open-source implementation for version 


management based on Git. 


Extensive and detailed documentation of the scientific 
methodology followed to build and validate a specific AI solu- 
tion should always be produced to ensure that independent 
reproduction and scrutiny of the technical implementation 
is feasible. When applicable, the source code of the specific 
implementation should also be made available at least for 
external technical review [30]. 


Privacy, and data governance Answer 


Comments 


Respect for privacy and data 
protection 


End-users' consent is the legal basis of the data pro- 
cessing provided with the tool. 

As a consequence, technical and organizational mea- 
sures shall be implemented to inform, and to let the 
data subject exercise her/his rights. 

For instance, the interface will allow to enable/dis- 
able both for algorithm continuous training and the 
decision-making process anytime. 

A data protection impact assessment is performed and 
risks for the availability, confidentiality, integrity of 
data shall be mitigated in terms of acceptability. The 
DPOs shall be involved in the process. 
Within the step of training of the AL-tool: Data gover- 
nance and ownership shall be governed in compliance 
with the applicable legal framework. Therefore, data 
shall be anonymous or, in case of pseudonymised 
ones in the frame of a collection provided directly 
from trials in the clinics, flows shall be encrypted, 
pseudonymisation techniques shall be applied and the 
involvement of the competent ethics committees shall 
be included in the process. 


Within this section, the ALassessment deals with all the 
GDPR compliance activities. 
These activities include the identification of the data gover 
nance, the security measures, and the identification of roles 
and responsibilities that ensure that all data processing are 
provided in light of the principles of lawfulness, fairness, 
and transparency, accountability, purposes limitation, data 
minimisation, and accuracy. 

For those AL-controllers/developers that are not processing 
personal data, they shall be compliant with the EU Regula- 
tion (EU) 2018/1807 of the European Parliament and of 
the Council of 14 November 2018 on a framework for the 
free flow of non-personal data in the European Union, that 
entered into force on 28 May 2019 [31]. 


Quality and integrity of data = The development of the tool is framed in a detailed 
privacy governance policy, following a tailored man- 


agement/authorization path. 


Data pooling activities shall pay attention to the possibil- 
ity that to process information by crossing anonymous and 
pseudonymised datasets may incur the risk of re-identifica- 
tion or, at least, the necessity to regulate data flows through 

specific data sharing agreements. 


Access to data Dedicated servers, with limited access, host the devel- 


opment of the tool and anonymised/pseudonymised 
data. 


The internal and external governance as well as technical 
measures shall be identified, implemented and maintained 
for the entire life-cycle of the data according to the results of 
the impact assessment. 
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Transparency [30] Answer 


Comments 


User is guided in the data gathering process (e.g., 
using viewfinder to centre the skin lesion, checking 
the blur of the image), ensuring a correct data quality. 
This is essential to guarantee the correct pre-process- 
ing phase, before feeding the algorithm using the 
user’s data. 

Decision-making outputs can be monitored using 
thresholds. As an example, in case of probability as 
output, data that gives as output probabilities within 
certain ranges, rising concern on the reliability of 
the algorithm output, are forwarded to a clinician, 
as described in the Communication requirement. 
These data paired with the clinician response (i.e., the 
ground truth or target) enlarge the original dataset 
that it is used to retrain the AI algorithm. Here, 
metrics described in the Accuracy section are used 
to evaluate the algorithm training, monitoring its 
performances. 


Traceability 


Traceability is functional to address the transparency of the 


AL-system in order to verify the quality of the decision-making 


outputs. AI algorithms are strongly dependent on the data 


format and quality used to train them to boost their learning 


ability and produce the desired result (in this case, detecting 
a malignant skin lesion). The format and quality must be 
similar to the ones the algorithm uses during the train- 
ing phase. AI algorithms can be considered “black boxes”, 
since they process data in ways that are not easily audited or 
understood by human. This limits the traceability of what 


happens within the algorithm. Nevertheless, strategies can be 


implemented to monitor the data quality and the algorithm 
performances (e.g. blockchain technologies) [31]. 


A viable inclusion of computer-aided design system in 
melanoma diagnosis pipeline was shown by Water- 
house et al. in 2019. The Al-decision may influence 

the screening activities, according to the user' experi- 


Explainability [32] 


ence, including clinicians. As an alternative, other 
recent methods tend to “open” black boxes models 
such as AI algorithms. Here the authors provided a 
classification of approaches to interpret the decision- 
making process based on the problem definition and 


the black box type [33,34]. 


The experience plays a crucial role in trusting an AL-based 
decision. Unexpert users tend to follow the results output 
by the AI tool. This is also true for clinicians. Indeed, it has 
been shown that clinicians with least experience in general 
follow the ALtool decisions if it contradicted their initial 


diagnosis, even if they were confident. However, faulty AI can 


mislead both experienced and least experienced clinicians. 
This is an aspect to consider when deploying these tools. 

This could be overcome providing additional features to the 

clinician (e.g., for the current application: asymmetry index, 
border irregularity index) that can aid their final decision. 


The interface shall be developed in accordance with 
the given task and the involved roles (clinicians or 
end users). 


Communication 


It has been demonstrated that for clinicians it is better to 
show multiclass probability when dealing with multiclass 
diagnosis. Conversely, malignancy probability can be used to 
manage binary decisions (e.g., whether or not do a biopsy, 
whether or not go visit an expert clinician). As described 
in the “explainability” requirement, unexpert user tend to 
follow the Al-results, even in the case of wrong suggestion. 
Melanoma growth happens within months, thus missing the 
detection of this lesion type can lead to adverse outcomes. 
The solely probability could be ambiguous, especially in case 
of intermediate values. In the effort of maximize the sensitiv- 
ity of the tool, giving clear responses to the user, the clinician 
himself/herself can be involved in this detection process. 
In case of probability/uncertainty, the lesion data could be 
forwarded to a clinician for an additional evaluation. The cli- 
nician decision is then used as the final response to the user 
as well as to improve the AI algorithm performances. Indeed, 
the new data, paired with the clinician response (i.e., the 
ground truth or target), enlarges the original training dataset 
and it is used to retrain the Al algorithm [35-37]. 


Diversity and non-discrimi- Answer 


nation 


Comments 


Biases are possible. Biases are mainly related to the 
personal characteristics that compose the training 
datasets. Melanoma incidence rate indeed is higher in 
those people with lower phototypes, while is almost 
rare in those with higher ones (e.g., black people). 
Datasets will thus be biased in the skin colour. 
Data gathering and assumptions made when training 
an algorithm can also introduce biases, thereby dis- 
torting the final output. These are important aspects 
to consider when dealing with automatic algorithm to 
prevent injustice and discriminations. As an example, 
this study showed that using the health costs as a proxy 
of health needs introduced a racial bias in the system, 
and the algorithm consider black patients healthier than 
equally sick white patients. This led to a reduction of 
28.8% in the number of black patients to be considered 
with high priority of health needs [38,39]. 


Unfair bias avoidance 


There are three types of bias: Productive bias, bias that 
someone would qualify as unfair, and discrimination bias. 
Algorithms cannot be unbiased. Bias in machine learning 

guarantees the success in modelling a certain distribution of 
data, according to the “No Free Lunch (NFL) Theorem”, thus 
solving the task of concern. The choice of cost function that 
the model must minimize to converge to the solution, the 
purpose and the use of limited training, and the test data are 
examples of productive bias. Also, the assumption that train- 
ing data distribution will be the same of the one of the test 
data repr(esents a production since in practice is often violated. 
However, solutions can be implemented in order to understand, 
mitigate or account for bias, as summarized in this survey by 
Ntoutsi. The discriminatory level of ML models can be limited 
adding constraints to the model optimization problem. Thus, 
it must be found a trade-off between constraints and accuracy 
(since it can worsen when adding to many constraints) [40,41]. 
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Societal and environmental 
wellbeing 


Answer 


Comments 


Sustainable and environmen- 


tally friendly AI 


One main weakness of using AI methods is tied to 
the high computational demand in terms of both 
hardware and energy. Hence, electronic and consum- 
ables replacement/disposal are involved Moreover, the 
type of model-task to be solved may require the use 
of especially complex and computationally expensive 
models (e.g. text based applications). In general, 
more complex models require higher expense in both 
economical, energetic and availability terms. 


A limitation of AI tools lies on the expensive hardware that 
is needed, especially for the training process of deep learning 
models. The use of GPUs is essential to increase the complex- 
ity of the model architecture, accelerating its learning process. 

The more GPUs are used, the more complex the task to be 
solved (and hence the implemented model) can be. The price 
of one good GPU is around $1,800 each and computational 
cloud resources may be rented. Moreover specifical technical 
methods may be adopted to reduce energy expenditure while 

only marginally affecting final performance [42,43]. 


Social impact 


The AI system contributes to the P5 medicine pur 
poses. 


As illustrated in the previous paragraphs, the effectiveness of 
these Al-tools is strongly connected to the clinician-patient 
relationship. The empowerment of patient’s awareness 
on cancer diseases is aligned with the public prevention 
purposes to enhance current society. This has also impactful 
consequences on the improvement of the healthcare systems 
and services. 

Reproducibility and transparency of the scientific methods 
applied to develop a given AL-based system boost innovation 
towards the mentioned societal challenges. 


Society and democracy 


The AI system contributes to the P5 medicine pur 
poses. 


Multiple and bottom-up solutions, that engage stakeholders, 
including vulnerable groups, may contribute to overcome 
societal barriers, reaching a more inclusive society. 
Trustworthiness of the given ALbased system constitutes a 
pre-condition to meet these challenges and enhance demo- 
cratic values. 


Accountability 


Answer 


Comments 


Auditability 


As stated in the Fallback plan and general safety re- 
quirement, adding redundancy to the AI method and 
continuous technical assessment could create a system 

of internal audit 


Also to schedule an independent audit to better test under 
common standards the developed technology could provide 
an accountable measure. The activity might be time-consum- 
ing and expensive, however the results are usually functional 
to align knowledge, competence, and skills useful to improve 
one’s approach towards innovation. 


Minimising and reporting 
negative impact 


In the current clinical scenario, considering the pos- 
sible different level of education, sensitivity of the 
end-users, other stakeholders might be involved in 

case of pre-identified vulnerabilities. 


A survey could be implemented in the ALtool interface in 
order to limit the access and use in case of specific vulnera- 
bilities. The same solution could be adopted for minors. The 
tool can be used by a legal representative in case of incapacity 
of the end-user. In other cases, the same AL-tool interface may 
suggest the user to not use the tool alone. 


Documenting trade-offs 


Within the tool development, an efficient manage- 
ment shall ensure a continuous monitoring of techni- 
cal activities and compliance ones. 


A trade-off analysis is part of the R@D&I life-cycle. The 
decision-making process provided by the Al-developer is 
continuously addressed to assess consequences to losing one 
quality, aspect or amount of something in return for gaining 
another quality, aspect or amount that shall in any case 
considered as trustworthy as the first one. Any decision shall 
be documented in order to be able to intervene if conditions 
change. 


Ability to redress 


Tests will evaluate the accuracy of the automated deci- 
sion making. In case of adverse impact occurs, specific 
mechanisms will ensure adequate redress. 


The user could be invited to self-evaluate the accuracy in 
terms of reproducibility of the obtained result. A survey 
could be implemented in the Al-tool interface to evaluate it 
and to report any possible adverse answer. 


The chart summarizes those answers provided during an ALTAI 
session aiming at evaluating the level of trustworthiness of a 
given technology in order to discuss possible technical and 
organizational measures to be implemented in an Al-based 
tool to reach an acceptable level of trustworthiness. A possible 
limit of the selfassessment approach consists of the fact that AL 
controllers/developers may encounter difficulties in explaining, 
in plain language, how a given tool works and its consequent 
functionalities. This is particularly evident in case of code 
interpretation, even between AL-developers, but also confirmed 
for different domain experts. This test of the ALTAI checklist 
constitutes a unique exercise of semantic alignment, awareness 
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development, and interdisciplinary training, strongly impacting on 
possible standardization of skills and competence involved in the 
AI compliance processes. 


As shown in the “comments” column of the chart, the checklist 
becomes a very useful tool for selfassessment only if it is 
accompanied by a series of good practices aimed at: 


i) Developing a common syllabus useful to align competences and 
skills among the different experts involved in the assessment. 


ii) Tailoring the ground of analysis to the specific sectors where 
the AL-ecosystem shall be performed (e.g., health sector, workplace, 
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mobility. 


iii) Addressing specific actions to firstly identify individual and 
group vulnerabilities, and then to overcome the relative barriers. 


iv) Addressing specific actions to pursue the mitigation actions 
beyond the development step in order to transfer also in the market 
context the trustworthy knowledge and know-how developed 
during the assessment. 


v) The technical board is updated to the highest standards 
applicable to the given sector/market where the AL-tool is placed. 


In addition, the selfassessment approach shall be promoted through 
policy-making incentives in order to ensure its application also 
without mechanisms of enforcement. In this regard, the Provisional 
Resolution specifies that they could constitute “a good starting 
point but cannot ensure that developers, deployers and users act 
fairly and guarantee the effective protection of individuals” and that 
in any cases, technology-neutral as well as specific standards shall 
be developed where appropriate. In particular, as far as artificial 
intelligence, robotics and related technologies are concerned, it 
suggests providing “mandatory compliance with legal obligations 
and ethical principles as laid down in the regulatory framework for 
AI” to be performed through “an impartial, regulated and external 
ex-ante assessment based on concrete and defined criteria”. 
Research and innovation, therefore, are those key-sectors where the 
ALTAI checklist or other impact assessment methodologies could 
find application. 


In this context, the illustrated steps will be included in every 
Altelated project life-cycle by design, becoming a significant 
component of the research integrity and reproducibility within the 
scientific methodology [23-47]. 


NEW REGULATORY CHALLENGES FOR ALIN 
THE HEALTHCARE SYSTEM 


In this paper, we provided an assessment of the ALTAI checklist 
considering the possible issues emerging while providing the 
evaluation for an AL-based tool applied to cancer medicine and, 
specifically, to early detecting the melanoma skin cancer. 


Firstly, the interdisciplinary approach that characterized the 
development of the checklist shall be applied also in the executive 
phase and maintained in the entre life-cycle of the technology 
development. The methodological outcome of the analysis is 
needed both to interpret and then to accomplish to the technical 
and organizational measures to be implemented as a consequence 
of the risk-based analysis. 


Secondly, considering that the abovementioned Provisional 
Resolution identifies, as high risk ones, the applications whose 
“development, deployment and use entail a significant risk of 
causing injury or harm to individuals or society, in breach of 
fundamental rights and safety rules as laid down in Union law”, the 
role of Al-based systems in P5 medicine shall require the inclusion 
of standardized notions, tailored risks, and mechanisms to ensure 
either the coherence among the sectorial legislative frameworks 
or the opportunity to re-assess their impact anytime the scientific 
progress could affect one of their fields of application. 


Thirdly, the frontiers of P5 medicine are significantly affected 
by the ongoing debate on the regulatory framework for Al. The 
opportunity to get access to health data and to reuse them for 
algorithms training purposes, as well as for providing further 
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information on a given patient, is crucial for innovation in terms 
of effectiveness and sustainability of the developed solutions. In 
addition, the chance to manage health data despite of the means, 
time, site of their collection is a strategy that could enhance the 
competitiveness of the related industrial sector and, at the same 
time, it could promote inclusiveness and awareness among citizens. 


To manage personal and non-personal data in compliance with 
the ethical-legal framework, therefore, constitutes a pre-condition 
to implement innovative tools and solutions in healthcare, that 
could enhance diagnosis and treatment and, at the same time, 
human dignity through the empowerment of the patient control 
over her data. A new regulatory framework on AI, indeed, shall 
provide specific instructions on how to develop per se compliant 
hosting infrastructures and it shall address the proper consistency 
mechanisms to ensure that any data processing could be mapped, 
assessed, and enabled for predetermined purposes. Human 
dignity and fundamental rights shall be protected by design and by 
default with concrete action points that could be understandable, 
applicable, and enforceable in every RG D&I context. 


CONCLUSION 


Thus, to promote a system of certification could represent a valuable 
solution to balance the need of procedures standardization, 
monitoring, and control of the level of compliance. Moreover, 
a multilevel system of enforcement, based on the accountability 
principle, and then on liability, could facilitate the establishment 
of a fruitful dialogue between developers and users, aiming at 
consolidating trust and awareness among citizens and stakeholders 
towards such a technological, ethicallegal revolution that AI 
brought in our society. 
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